---
title: "Replication for running the code for Populism and Political Parties Expert Survey v2"
author: "Robert A. Huber, Maurits Meijers, and Andrej Zaslove"
output:
  rmdformats::downcute:
    use_bookdown: true
    self_contained: true
    thumbnails: false
    lightbox: true
    gallery: false
    highlight: monochrome
editor_options:
  chunk_output_type: console
---

```{r globopt, include = F}

knitr::opts_chunk$set(warning = FALSE)
options(savehistory = FALSE)

```

<br>
<br>

```{css, echo=FALSE}

.red-box {
  border: 1px solid red;
  padding: 10px;
  background-color: white;
  color: red;
  font-size: 1em;
}

```

<div class="red-box">

Please note that the following variables are currently under embargo. These items will be released shortly:

* democracy
* positionchange
* compromise

</div>

# How to proceed

All steps required to replicate the results in `R` can be run from the file `master_replication_v2.R`. This file sources the individual scripts, processes the data at several levels, and creates the output datasets.

# Prepare Replication Data

After downloading our replication material from <http://poppa-data.eu/> or the Harvard Dataverse (<https://doi.org/10.7910/DVN/RMQREQ>), please open the file `POPPA_Replication_v2.Rproj`. This requires `R` (version 4.4.x) and `RStudio`. After opening the `.Rproj` file in `RStudio`, please open the file `master_replication_v2.R`, which is located in the subfolder `rcode_v2`. The following sections describe each step of `master_replication_v2.R`.

# Install/Load Packages

The following code installs any required packages that are not yet installed and then loads all of them. It then uses the `here` package to declare the project root, so that file paths can be built relative to the folder that contains `POPPA_Replication_v2.Rproj`. Runtime depends on how many of the packages you have already installed; even with none installed, it should take only a few minutes.

```{r pkgload}

run_start_total <- Sys.time()
run_start <- Sys.time()

pkgs <- c("here",
          "tidyverse",
          "naniar",
          "dataverse",
          "lavaan",
          "conflicted",
          "pander")   # used to print sessionInfo() at the end

# Function to check if packages are installed
# If not: package will be installed from CRAN and then loaded
# If: Package will be loaded

install_load <- function(packages){
  
  for (p in packages) {
    cat("Check package: '", p, "'...\n", sep = "")
    flush.console()
    
    if (p %in% rownames(installed.packages())) {
      
      cat("Package: '", p, "' is already installed...\n\n", sep = "")
      flush.console()
      
      library(p, character.only=TRUE)
      
    } else {
      
      cat("Package: '", p, "' is NOT installed! Will install now...\n\n", sep = "")
      install.packages(p)
      library(p,character.only = TRUE)
      
    }
  }
  cat("\nAll packages installed!\n\n")
}

# Apply function to all required packages

install_load(pkgs)

# Declare the project root for here() (this does not change the working directory)

here::i_am("POPPA_Replication_v2.Rproj")

run_stop <- Sys.time()
run_time <- (run_stop - run_start)
run_time

```
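
A possible alternative to the loop above, sketched here for illustration and not part of the replication files: `requireNamespace()` checks whether each package can be loaded, which avoids calling `installed.packages()` once per package (that call can be slow on systems with large libraries). The helper names `missing_pkgs()` and `install_missing()` are hypothetical.

```r
# Hypothetical helper: return the packages that cannot be loaded
missing_pkgs <- function(packages) {
  loadable <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)
  packages[!loadable]
}

# Hypothetical helper: install only the missing packages from CRAN,
# then attach every package in the list
install_missing <- function(packages) {
  to_install <- missing_pkgs(packages)
  if (length(to_install) > 0) install.packages(to_install)
  invisible(lapply(packages, library, character.only = TRUE))
}

# Example: base packages are always available, so nothing is missing
missing_pkgs(c("stats", "utils"))
# returns character(0)
```

With the package vector defined above, `install_missing(pkgs)` would play the same role as `install_load(pkgs)`, without the per-package progress messages.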

# Download the data

The raw data were downloaded from Qualtrics. We cannot share this step or the original data for privacy reasons: the download relies on a private API key, and the original data contain private information about the respondents.

# Clean raw data

The script `cleaning_rawdata_v2.R` in the subfolder `rcode_v2` loads the raw survey data (`df_list_v2.rds`) and cleans it: all *Don't know* answers are recoded as `NA`, and minor inconsistencies in variable names are removed.

```{r, include=T}

run_start <- Sys.time()

source("rcode_v2/cleaning_rawdata_v2.R", echo = T)

run_stop <- Sys.time()
run_time <- (run_stop - run_start)
run_time

```
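
To illustrate the recoding of *Don't know* answers, here is a minimal sketch that is not taken from the cleaning script. It assumes, purely for illustration, that *Don't know* answers are stored as the numeric code `99`; the column names, the toy data, and the helper `recode_dk()` are hypothetical. The `naniar` package loaded above also provides functions such as `replace_with_na()` for this kind of task.

```r
# Hypothetical helper: set "Don't know" codes to NA in the selected columns.
# Assumption: "Don't know" is stored as 99; the real coding may differ.
recode_dk <- function(df, cols, dk_codes = 99) {
  df[cols] <- lapply(df[cols], function(x) {
    x[x %in% dk_codes] <- NA
    x
  })
  df
}

# Toy survey responses (illustrative only); expert_id is not recoded
raw <- data.frame(
  expert_id   = c(1, 2, 3, 99),
  antielitism = c(7, 99, 3, 10),
  manichean   = c(99, 5, 6, 2)
)

clean <- recode_dk(raw, cols = c("antielitism", "manichean"))
colSums(is.na(clean))
# expert_id: 0, antielitism: 1, manichean: 1
```

Restricting the recoding to named columns keeps identifiers such as `expert_id` untouched even if they happen to contain the same code.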


# Creating the expert dataset

The script `expert_dataset_v2.R` in the subfolder `rcode_v2` creates the expert dataset from the individual survey responses. Line 51 in `expert_dataset_v2.R` sets the minimum number of valid responses, which users can adjust. We recommend a minimum of 4: in our core publication, all party-level variables are set to `NA` for parties rated by fewer than 4 experts.

```{r, include=T}

run_start <- Sys.time()

source("rcode_v2/expert_dataset_v2.R", echo = T)

run_stop <- Sys.time()
run_time <- (run_stop - run_start)
run_time

```
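
The minimum-response rule can be sketched as follows. This is an illustrative example, not the code in `expert_dataset_v2.R`; the column names, the toy data, and the helper `count_valid()` are hypothetical, and `min_experts = 4` mirrors the recommended threshold.

```r
# Hypothetical helper: number of valid (non-NA) responses per party
count_valid <- function(values, parties) {
  vapply(split(values, parties), function(v) sum(!is.na(v)), integer(1))
}

# Toy expert-level data (illustrative only)
experts <- data.frame(
  party_id = c("A", "A", "A", "A", "B", "B", "B", "B", "B"),
  populism = c(8, 7, 6, 9, 2, 3, NA, NA, 1)
)

min_experts <- 4
n_valid <- count_valid(experts$populism, experts$party_id)
meets_min <- n_valid >= min_experts

n_valid    # A: 4, B: 3
meets_min  # A: TRUE, B: FALSE
```

Party B has five experts but only three valid answers, so it would fall below the threshold: what counts is valid responses, not the number of experts contacted.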

# Party level dataset

The script `party_dataset_v2.R` in the subfolder `rcode_v2` aggregates the expert-level data to party scores, adds further information on the parties, and saves the result as `poppa2_v2.rds` in the subfolder `final_data_v2`.

```{r, include=T}

run_start <- Sys.time()

source("rcode_v2/party_dataset_v2.R", echo = T)

run_stop <- Sys.time()
run_time <- (run_stop - run_start)
run_time

```
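
The aggregation from expert ratings to party scores might look like the sketch below. It assumes that a party score is the mean of the valid expert ratings and that parties below the minimum number of valid responses receive `NA`; the helper `aggregate_party()` and the toy data are hypothetical, and `party_dataset_v2.R` may differ in detail.

```r
# Hypothetical helper: mean of valid expert ratings per party,
# NA if a party has fewer than `min_experts` valid ratings
aggregate_party <- function(values, parties, min_experts = 4) {
  vapply(split(values, parties), function(v) {
    v <- v[!is.na(v)]
    if (length(v) < min_experts) NA_real_ else mean(v)
  }, numeric(1))
}

# Toy expert-level data (illustrative only)
experts <- data.frame(
  party_id    = c("A", "A", "A", "A", "B", "B", "B"),
  antielitism = c(8, 7, 6, 9, 2, 3, NA)
)

party_scores <- aggregate_party(experts$antielitism, experts$party_id)
party_scores
# A: 7.5, B: NA
```

After the script has run, the saved party-level data can be loaded with `readRDS("final_data_v2/poppa2_v2.rds")`.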

# Create an integrated dataset that contains both POPPA waves

The script `combine_waves_v2.R` in the subfolder `rcode_v2` downloads POPPA wave 1 from the Harvard Dataverse (<https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/8NEL7B>) and combines it with the second wave, saved as `poppa2_v2.rds` in the subfolder `final_data_v2`. It produces several datasets. First, it expands the POPPA wave 1 dataset with additional meta-data to match the second wave of POPPA and saves it as `poppa1.rds` in the subfolder `final_data_v2`. In this dataset, all variable names are kept unchanged for backwards compatibility. Second, we rename the variables in the POPPA wave 1 dataset to match POPPA wave 2 and combine it with the second wave. This integrated dataset is saved as `poppa_integrated_v2.rds` in the subfolder `final_data_v2`.

Third, we run a confirmatory factor analysis (CFA) for the populism variable and attach the factor scores to the dataset. We provide both the original factor scores (`populism_cfa`) and a version rescaled to range from 0 to 10 (`populism_cfa_rescaled`). The paper *The State of Populism: Introducing the 2023 Wave of the Populism and Political Parties Expert Survey* introduces the data and describes in more detail how the CFA and the populism variables are constructed. We also provide a Quarto file that reproduces the steps for the CFA (see the folder `replication_code_cfa_variable_v2` and the file `cfa_proofs_populism_latent_variable_v2.qmd`). This version of the dataset includes the previously missing party for Cyprus. Compared with the version without this party, the CFA populism variables and the populism mean variable differ only very slightly.
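
A rescaled variable of this kind can be illustrated with a linear min-max transformation, 10 * (x - min) / (max - min), which maps the lowest factor score to 0 and the highest to 10. This is an assumption for illustration only: the exact transformation behind `populism_cfa_rescaled` is documented in the paper and the Quarto file. The factor scores themselves come from the CFA (the package list above includes `lavaan`); the helper `rescale_0_10()` is hypothetical.

```r
# Hypothetical helper: linearly rescale x so that min(x) -> 0 and max(x) -> 10.
# Assumption: min-max rescaling; the POPPA scripts may use another method.
rescale_0_10 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) stop("x has no variation; rescaling is undefined")
  10 * (x - rng[1]) / (rng[2] - rng[1])
}

rescale_0_10(c(-2, 0, 2, NA))
# returns 0, 5, 10, NA
```

Because the transformation is linear, it preserves the ordering and relative distances of the factor scores; only the unit and origin change.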

```{r, include=T}
run_start <- Sys.time()

source("rcode_v2/combine_waves_v2.R", echo = T)

run_stop <- Sys.time()
run_time <- (run_stop - run_start)
run_time
```
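
The renaming and stacking step described for `combine_waves_v2.R` can be sketched in base `R` as follows. The variable names, the mapping `name_map`, and the helper `harmonize_and_stack()` are hypothetical and do not reproduce the POPPA codebook.

```r
# Hypothetical helper: rename wave 1 columns, tag waves, and stack both waves.
# name_map is a named character vector: old (wave 1) name -> new (wave 2) name.
harmonize_and_stack <- function(wave1, wave2, name_map) {
  # Rename wave 1 columns that appear in name_map
  hit <- names(wave1) %in% names(name_map)
  names(wave1)[hit] <- unname(name_map[names(wave1)[hit]])
  # Record which wave each row comes from
  wave1$wave <- 1L
  wave2$wave <- 2L
  # Add columns present in only one wave as NA so that rbind() can align them
  all_cols <- union(names(wave1), names(wave2))
  for (col in setdiff(all_cols, names(wave1))) wave1[[col]] <- NA
  for (col in setdiff(all_cols, names(wave2))) wave2[[col]] <- NA
  rbind(wave1[all_cols], wave2[all_cols])
}

# Toy data (illustrative only)
wave1 <- data.frame(party = c("P1", "P2"), country = c("AT", "DE"),
                    pop_score = c(7.1, 2.3))
wave2 <- data.frame(party = "P3", country = "NL", populism = 8.4,
                    new_item = 5)
name_map <- c(pop_score = "populism")

combined <- harmonize_and_stack(wave1, wave2, name_map)
combined
```

`dplyr::bind_rows()` from the tidyverse achieves the same column alignment, filling columns that are absent from one wave with `NA`.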

# Create Stata labels

The do-file `POPPA - Variable Names and Labels.do` in the subfolder `stata_dofiles_v2` creates Stata labels for `poppa_integrated_v2.dta`.

# List of files

The following files and folders are included:

```{r}
# List of files in the main directory
list.files()

# List of original data files
list.files("./original_data_v2/")

# List of R code files
list.files("./rcode_v2/")

# List of CFA-related files
list.files("./replication_code_cfa_variable_v2/")

# List of Stata files
list.files("./stata_dofiles_v2/")
```

# Session Info

This notebook was run using the following setup:

```{r}

pander::pander(sessionInfo())

run_stop_total <- Sys.time()
run_time_total <- (run_stop_total - run_start_total)
run_time_total

```
